1.
Nat Commun ; 14(1): 8270, 2023 Dec 13.
Article in English | MEDLINE | ID: mdl-38092765

ABSTRACT

There is currently little information about the evolution of gene clusters, genome architectures and karyotypes in early branching animals. Slowly evolving anthozoan cnidarians can be particularly informative about the evolution of these genome features. Here we report chromosome-level genome assemblies of two related anthozoans, the sea anemones Nematostella vectensis and Scolanthus callimorphus. We find a robust set of 15 chromosomes with a clear one-to-one correspondence between the two species. Both genomes show chromosomal conservation, allowing us to reconstruct ancestral cnidarian and metazoan chromosomal blocks, consisting of at least 19 and 16 ancestral linkage groups, respectively. We show that, in contrast to Bilateria, the Hox and NK clusters of investigated cnidarians are largely disintegrated, despite the presence of staggered hox/gbx expression in Nematostella. This loss of microsynteny conservation may be facilitated by shorter distances between cis-regulatory sequences and their cognate transcriptional start sites. We find no clear evidence for topologically associated domains, suggesting fundamental differences in long-range gene regulation compared to vertebrates. These data suggest that large sets of ancestral metazoan genes have been retained in ancestral linkage groups of some extant lineages; yet, higher order gene regulation with associated 3D architecture may have evolved only after the cnidarian-bilaterian split.


Subject(s)
Sea Anemones , Animals , Sea Anemones/genetics , Phylogeny , Synteny/genetics , Gene Expression Regulation , Genome/genetics
2.
Nat Biotechnol ; 41(7): 1018-1025, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36593407

ABSTRACT

Nanopore sequencers can select which DNA molecules to sequence, rejecting a molecule after analysis of a small initial part. Currently, selection is based on predetermined regions of interest that remain constant throughout an experiment. Sequencing efforts, thus, cannot be re-focused on molecules likely contributing most to experimental success. Here we present BOSS-RUNS, an algorithmic framework and software to generate dynamically updated decision strategies. We quantify uncertainty at each genome position with real-time updates from data already observed. For each DNA fragment, we decide whether the expected decrease in uncertainty that it would provide warrants fully sequencing it, thus optimizing information gain. BOSS-RUNS mitigates coverage bias between and within members of a microbial community, leading to improved variant calling; for example, low-coverage sites of a species at 1% abundance were reduced by 87.5%, with 12.5% more single-nucleotide polymorphisms detected. Such data-driven updates to molecule selection are applicable to many sequencing scenarios, such as enriching for regions with increased divergence or low coverage, reducing time-to-answer.


Subject(s)
Nanopore Sequencing , Nanopores , Research Design , Bayes Theorem , Genome , Software , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA
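Illustrative note: the decision rule described in the abstract above (sequence a fragment only if its expected reduction in uncertainty is large enough) can be sketched in a few lines. This is a toy model and not the BOSS-RUNS implementation. The Dirichlet-smoothed base frequencies, the use of Shannon entropy as the uncertainty measure, the fixed threshold, and all function names are assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def posterior(counts, prior=1.0):
    """Smoothed base frequencies (A, C, G, T) from observed counts.

    Assumption: a symmetric Dirichlet prior stands in for the real model.
    """
    total = sum(counts) + prior * len(counts)
    return [(c + prior) / total for c in counts]

def expected_entropy_drop(counts, prior=1.0):
    """Expected entropy decrease at one position after one more observed base."""
    post = posterior(counts, prior)
    before = entropy(post)
    after = 0.0
    for i, p in enumerate(post):
        updated = list(counts)
        updated[i] += 1  # hypothetical next observation of base i
        after += p * entropy(posterior(updated, prior))
    return before - after

def should_sequence(fragment_positions, counts_by_position, threshold):
    """Accept a fragment if its summed expected benefit reaches the threshold."""
    benefit = sum(
        expected_entropy_drop(counts_by_position[pos])
        for pos in fragment_positions
    )
    return benefit >= threshold

# Hypothetical data: positions 0-1 are uncovered, positions 2-3 are well covered.
counts = {0: [0, 0, 0, 0], 1: [0, 0, 0, 0], 2: [30, 0, 0, 0], 3: [30, 0, 0, 0]}
accept_low_coverage = should_sequence([0, 1], counts, threshold=0.1)
accept_high_coverage = should_sequence([2, 3], counts, threshold=0.1)
```

In this sketch a fragment over uncovered positions is accepted and one over well-covered positions is rejected, which mirrors the idea of steering sequencing effort toward sites that are still uncertain.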
3.
PLoS Comput Biol ; 18(4): e1010056, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35486906

ABSTRACT

Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, and are an essential component of some inference methods. The ongoing surge in available sequence data is, however, testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult. Here, we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. >100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and it implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open source software allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to more realistically represent SARS-CoV-2 genome evolution.


Subject(s)
COVID-19 , Pandemics , Algorithms , COVID-19/epidemiology , Computer Simulation , Evolution, Molecular , Humans , Phylogeny , SARS-CoV-2/genetics , Software
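Illustrative note: the Gillespie approach named in the abstract above draws the waiting time to the next mutation from the total mutation rate, then picks the mutating site in proportion to its rate, so that on a short branch only a few sites are ever touched. The sketch below is a minimal toy version and not the authors' software. It assumes fixed per-site rates that do not depend on the current base, a uniform choice among the three other nucleotides, and a flat cumulative-sum lookup in place of the multi-layered search tree. The name `simulate_branch` is hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

BASES = "ACGT"

def simulate_branch(sequence, site_rates, branch_length, rng):
    """Simulate substitutions along one branch with a Gillespie-style loop.

    Returns the evolved sequence and a list of
    (time, site, old_base, new_base) events.
    """
    seq = list(sequence)
    cumulative = list(accumulate(site_rates))
    total_rate = cumulative[-1] if cumulative else 0.0
    events = []
    if total_rate <= 0:
        return "".join(seq), events

    # Waiting time to the first event is exponential in the total rate.
    elapsed = rng.expovariate(total_rate)
    while elapsed < branch_length:
        # Pick a site with probability proportional to its rate.
        site = bisect_right(cumulative, rng.random() * total_rate)
        site = min(site, len(seq) - 1)  # guard against rounding at the upper edge
        old_base = seq[site]
        new_base = rng.choice([b for b in BASES if b != old_base])
        events.append((elapsed, site, old_base, new_base))
        seq[site] = new_base
        elapsed += rng.expovariate(total_rate)
    return "".join(seq), events

rng = random.Random(1)
ancestor = "ACGT" * 25
descendant, events = simulate_branch(ancestor, [1.0] * 100, 0.01, rng)
```

The cost of this loop scales with the number of mutation events rather than with genome length, which is the property the abstract relies on for short branches.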
4.
bioRxiv ; 2021 Sep 23.
Article in English | MEDLINE | ID: mdl-33758852

ABSTRACT

Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, as well as being part of some inference methods. The ongoing surge in available sequence data is, however, testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult. Here we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. >100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open source software is available from https://github.com/NicolaDM/phastSim and allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to more realistically represent SARS-CoV-2 genome evolution.

5.
Genome Biol Evol ; 12(11): 2139-2152, 2020 Nov 3.
Article in English | MEDLINE | ID: mdl-33210145

ABSTRACT

The P-element, one of the best understood eukaryotic transposable elements, spread in natural Drosophila melanogaster populations in the last century. It invaded American populations first and later spread to the Old World. Inferring this invasion route was made possible by a unique resource available in D. melanogaster: Many strains sampled from different locations over the course of the last century. Here, we test the hypothesis that the invasion route of the P-element may be reconstructed from extant population samples using internal deletions (IDs) as markers. These IDs arise at a high rate when DNA transposons, such as the P-element, are active. We suggest that inferring invasion routes is possible as: 1) the fraction of IDs increases in successively invaded populations, which also explains the striking differences in the ID content between American and European populations, and 2) successively invaded populations end up with similar sets of IDs. This approach allowed us to reconstruct the invasion route of the P-element with reasonable accuracy. Our approach also sheds light on the unknown timing of the invasion in African populations: We suggest that African populations were invaded after American but before European populations. Simulations of TE invasions in spatially distributed populations confirm that IDs may allow us to infer invasion routes. Our approach might be applicable to other DNA transposons in different host species.


Subject(s)
DNA Transposable Elements , Drosophila melanogaster/genetics , Gene Flow , Models, Genetic , Animals , Sequence Deletion
6.
PLoS Genet ; 16(11): e1009175, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33206635

ABSTRACT

The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab- or protocol-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites and are more likely to affect the protein-coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and make it appear as though there has been an excess of recurrent mutation or recombination among viral lineages. We suggest how samples can be screened and problematic variants removed, and we plan to regularly inform the scientific community with our updated results as more SARS-CoV-2 genome sequences are shared (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 and https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). We also develop tools for comparing and visualizing differences among very large phylogenies and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes.
Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors and establish a widely shared, stable clade structure for a more accurate scientific inference and discourse.


Subject(s)
Genome, Viral/genetics , Phylogeny , SARS-CoV-2/genetics , Algorithms , COVID-19 , Computational Biology , Evolution, Molecular , Humans , RNA, Viral/genetics , Sequence Alignment , Whole Genome Sequencing
7.
Mol Ecol Resour ; 19(5): 1346-1354, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31056858

ABSTRACT

Transposable elements (TEs) are selfish DNA sequences that multiply within host genomes. They are present in most species investigated so far at varying degrees of abundance and sequence diversity. The TE composition may not only vary between but also within species and could have important biological implications. Variation in prevalence among populations may for example indicate a recent TE invasion, whereas sequence variation could indicate the presence of hyperactive or inactive forms. Gaining unbiased estimates of TE composition is thus vital for understanding the evolutionary dynamics of transposons. To this end, we developed DeviaTE, a tool to analyse and visualize TE abundance using Illumina or Sanger sequencing reads. Our tool requires sequencing reads of one or more samples (tissue, individual or population) and consensus sequences of TEs. It generates a table and a visual representation of TE composition. This allows for an intuitive assessment of coverage, sequence divergence, segregating SNPs and indels, as well as the presence of internal and terminal deletions. By contrasting the coverage between TEs and single copy genes, DeviaTE derives unbiased estimates of TE abundance. We show that naive approaches, which do not consider regions spanned by internal deletions, may substantially underestimate TE abundance. Using published data we demonstrate that DeviaTE can be used to study the TE composition within samples, identify clinal variation in TEs, compare TE diversity among species, and monitor TE invasions. Finally we present careful validations with publicly available and simulated data. DeviaTE is implemented in Python and distributed under the GPLv3 (https://github.com/W-L/deviaTE).


Subject(s)
Computational Biology/methods , Interspersed Repetitive Sequences , Sequence Analysis, DNA/methods , Software
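Illustrative note: the abstract above describes estimating TE abundance by contrasting coverage on the TE with coverage on single-copy genes, and warns that ignoring regions spanned by internal deletions underestimates abundance. The sketch below shows one plausible reading of that idea and is not DeviaTE's actual code. The function name, the input layout, and the rule of crediting each deletion-spanning read to every position it skips are assumptions.

```python
from statistics import mean

def te_copy_number(te_coverage, single_copy_coverage, deletions=()):
    """Estimate TE copies per haploid genome from per-position read coverage.

    te_coverage: read depth at each position of the TE consensus.
    single_copy_coverage: read depth at positions of single-copy genes.
    deletions: (start, end, n_reads) tuples for reads that span an internal
        deletion covering consensus positions start..end-1 (assumed layout).
    """
    adjusted = list(te_coverage)
    for start, end, n_reads in deletions:
        for pos in range(start, end):
            # A read spanning the deletion still represents one TE copy,
            # so credit it to the positions it skips.
            adjusted[pos] += n_reads
    return mean(adjusted) / mean(single_copy_coverage)

# Hypothetical sample: most copies carry a deletion of positions 10-39.
te_cov = [100] * 10 + [20] * 30 + [100] * 10
scg_cov = [10] * 50
naive = te_copy_number(te_cov, scg_cov)
corrected = te_copy_number(te_cov, scg_cov, deletions=[(10, 40, 80)])
```

With these made-up numbers the naive estimate is about half the corrected one, which illustrates how strongly truncated copies can bias a coverage-only estimate.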
8.
Nat Methods ; 15(6): 469, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29786093

ABSTRACT

In the version of this Brief Communication originally published online, ref. 21 included details for a conference paper (Pegard, N. C. et al. Paper presented at Novel Techniques in Microscopy: Optics in the Life Sciences, Vancouver, BC, Canada, 12-15 April 2015). The correct reference is the following: Pégard, N. C. et al. Optica 3, 517-524 (2016). This error has been corrected in the print, HTML and PDF versions of the paper.

9.
Nat Methods ; 15(6): 429-432, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29736000

ABSTRACT

Thus far, optical recording of neuronal activity in freely behaving animals has been limited to a thin axial range. We present a head-mounted miniaturized light-field microscope (MiniLFM) capable of capturing neuronal network activity within a volume of 700 × 600 × 360 µm³ at 16 Hz in the hippocampus of freely moving mice. We demonstrate that neurons separated by as little as ~15 µm and at depths up to 360 µm can be discriminated.


Subject(s)
Hippocampus/cytology , Hippocampus/physiology , Miniaturization/instrumentation , Neurons/physiology , Animals , Intravital Microscopy/instrumentation , Intravital Microscopy/methods , Mice , Optical Imaging/instrumentation , Optical Imaging/methods
10.
Nat Methods ; 14(8): 811-818, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28650477

ABSTRACT

Light-field microscopy (LFM) is a scalable approach for volumetric Ca²⁺ imaging with high volumetric acquisition rates (up to 100 Hz). Although the technology has enabled whole-brain Ca²⁺ imaging in semi-transparent specimens, tissue scattering has limited its application in the rodent brain. We introduce seeded iterative demixing (SID), a computational source-extraction technique that extends LFM to the mammalian cortex. SID can capture neuronal dynamics in vivo within a volume of 900 × 900 × 260 µm located as deep as 380 µm in the mouse cortex or hippocampus at a 30-Hz volume rate while discriminating signals from neurons as close as 20 µm apart, at a computational cost three orders of magnitude less than that of frame-by-frame image reconstruction. We expect that the simplicity and scalability of LFM, coupled with the performance of SID, will open up a range of applications including closed-loop experiments.


Subject(s)
Brain Mapping/methods , Calcium Signaling/physiology , Image Interpretation, Computer-Assisted/methods , Microscopy, Video/methods , Molecular Imaging/methods , Neurons/physiology , Algorithms , Animals , Female , Male , Mice , Mice, Inbred C57BL , Neurons/cytology , Nimodipine , Zebrafish